[fix](parquet) Fix wrong encoding for parquet page v2#63305
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run external |
|
run buildall |
TPC-H: Total hot run time: 31130 ms |
TPC-DS: Total hot run time: 169421 ms |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
### What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary: Add a BE unit test for rejecting Parquet DATA_PAGE_V2 chunks that contain non-dictionary data page encodings in encoding_stats.
### Release note
None
### Check List (For Author)
- Test: Unit Test
- Added BE UT for RowGroupReader::is_dictionary_encoded DATA_PAGE_V2 encoding_stats handling. Attempted ./run-be-ut.sh --run --filter=ParquetThriftReaderTest.is_dictionary_encoded_rejects_plain_data_page_v2, but local environment has JAVA_HOME on JDK 11 and run-be-ut.sh requires JDK 17. Attempted build-support/clang-format.sh, but local environment lacks llvm@16.
- Behavior changed: No
- Does this need documentation: No
|
run buildall |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
TPC-H: Total hot run time: 31243 ms |
TPC-DS: Total hot run time: 168762 ms |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
1 similar comment
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
/review |
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
There was a problem hiding this comment.
Review result: request changes.
Critical checkpoint conclusions:
- Goal/test proof: The production change aims to reject non-dictionary DATA_PAGE_V2 pages for Parquet dictionary filtering, and the main code change is directionally correct, but the added BE unit test does not compile because it calls a private method directly.
- Scope/focus: The production change is small and focused. The test access pattern needs to be corrected without broadening behavior unnecessarily.
- Concurrency/lifecycle/config/compatibility: No new concurrency, special lifecycle, configuration, storage format, or FE-BE protocol compatibility concerns found.
- Parallel code paths: There is a similar Iceberg dictionary-encoding helper/test path that already covers DATA_PAGE_V2; no additional blocking issue found beyond the new Parquet test compile failure.
- Special checks/error handling/memory/transactions/data writes: No new concerns found in the changed production logic.
- Tests/results: Regression coverage was added, but BE unit coverage is currently broken at compile time, so it cannot validate the fix until the private-access issue is resolved.
- Observability/performance: No additional observability or performance issue found for this small metadata check.
User focus: No additional user-provided review focus was present.
| position_delete_ctx, lazy_read_ctx, nullptr, column_ids, | ||
| filter_column_ids); | ||
|
|
||
| EXPECT_FALSE(row_group_reader.is_dictionary_encoded(column_metadata)); |
There was a problem hiding this comment.
This test will not compile as written: RowGroupReader::is_dictionary_encoded is declared in the private section of vparquet_group_reader.h, and this test file does not use a friend/test helper or a private visibility override. Because the new BE UT directly calls row_group_reader.is_dictionary_encoded(...), the target fails at compile time instead of validating the DATA_PAGE_V2 fix. Please expose this logic through a testable helper (or test through the public reader path) before adding the assertion.
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)